hi SteepDeer88
did you manage to get rid of your issue?
it is basically auto-generated when you do clearml-init
there are a bunch of optional configurations that are not in the auto-generated file though.
Have a look here, it is pretty detailed: https://clear.ml/docs/latest/docs/configs/clearml_conf
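For instance, here is a sketch of one optional section you could add by hand to your ~/clearml.conf (agent.package_manager.type is a real key, but whether you need it depends on your setup):
```
# optional section - not necessarily present in the auto-generated file
agent {
    package_manager {
        # use conda instead of pip when the agent builds environments
        type: conda
    }
}
```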
you are in a regular execution - I mean not a local one. So the different pipeline tasks have been enqueued. You simply need to fire up an agent to pull the enqueued tasks. I would advise you to specify the queue in the steps (parameter execution_queue).
You then fire your agent :
clearml-agent daemon --queue my_queue
An agent is a process that pulls tasks from a queue and assigns resources (a worker) to them. In a pipeline, when it is not run locally, the steps are enqueued as tasks.
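Something like this (a minimal sketch using the decorator-based pipeline API; project/queue names are placeholders):
```
from clearml.automation.controller import PipelineDecorator

# each step becomes its own task; execution_queue tells the agent
# which queue to pull it from ("my_queue" is a placeholder)
@PipelineDecorator.component(return_values=["data"], execution_queue="my_queue")
def load_data():
    return [1, 2, 3]

@PipelineDecorator.pipeline(name="demo pipeline", project="examples", version="0.1")
def run_pipeline():
    print(load_data())

if __name__ == "__main__":
    run_pipeline()
```
The agent you fired on my_queue will then pick those steps up.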
can you share with me an example or part of your code? I might be missing something in what you intend to achieve
When the pipeline or any step is executed, a task is created, and its name is taken from the decorator parameters. Additionally, for a step, the name parameter is optional: if not provided, the function name is used instead.
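For example (a rough sketch, the names are placeholders):
```
from clearml.automation.controller import PipelineDecorator

# name provided: the step task will be called "prepare data"
@PipelineDecorator.component(name="prepare data")
def step_one():
    ...

# no name provided: the step task falls back to the function name, "step_two"
@PipelineDecorator.component()
def step_two():
    ...

# the controller task will be named "my pipeline"
@PipelineDecorator.pipeline(name="my pipeline", project="examples", version="0.1")
def run_pipeline():
    step_one()
    step_two()
```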
It seems to me that your script fails to create the pipeline controller task because it fails to pull the name parameter, which is weird ... weird because in the last error line, we can see that name!
btw Ofir, can you send me the versions of your different clearml packages?
those are the credentials you got from your self-hosted server?
what about the logs before the error? I think it is relevant to have them all. I am trying to isolate the error, and to understand whether it comes from the credentials, the server addresses, a file error, or a network error
Interesting. We are opening a discussion to weigh the pros and cons of those different approaches - I'll of course keep you updated
Could you please open a GitHub issue about that topic? 🙏
http://github.com/allegroai/clearml/issues
hey H4dr1en
you just specify the packages that you want installed (no need to specify their dependencies) and the version if needed.
Something like :
pytorch==1.10.0
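I don't know exactly where you are setting them, but for instance, if it is the packages argument of a pipeline component, a sketch would look like this (the package strings and pins are just placeholders, reusing your example above):
```
from clearml.automation.controller import PipelineDecorator

# list only the top-level packages, pinned where it matters;
# their dependencies are resolved automatically when the env is built
@PipelineDecorator.component(packages=["pytorch==1.10.0", "numpy"])
def train_step():
    ...
```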
can you please provide the apiserver log and the elasticsearch log?
hey
"when cloning an experiment via the WebUI, shouldn't the cloned experiment have the original experiment as a parent? It seems to be empty"
you are right, I think there is a bug here. We will release a fix asap 🙂
Hey UnevenDolphin73
I have tried to reproduce the issue, but with no success. I manage to auto-report any graph designed according to your description - values between [0, 50] and sudden extreme values. So far everything seems to be ok on my side. Have you found anything new regarding this issue? Could you send me more details on the graph whose reporting hangs?
Thanks
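For reference, my reproduction attempt looked roughly like this (a minimal sketch; project/metric names are placeholders):
```
import random
from clearml import Logger, Task

task = Task.init(project_name="examples", task_name="extreme values repro")
logger = Logger.current_logger()

for i in range(1000):
    # mostly values in [0, 50], with an occasional extreme spike
    value = 1e9 if i % 100 == 99 else random.uniform(0, 50)
    logger.report_scalar(title="my metric", series="values", value=value, iteration=i)
```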
btw here is the content of the imported file:
```
import torch
from torchvision import datasets, transforms
import os

MY_GLOBAL_VAR = 32

def my_dataloder():
    return torch.utils.data.DataLoader(
        datasets.MNIST(os.path.join('./', 'data'), train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
...
```
No, it is supposed to have its status updated automatically. We may have a bug. Can you share some example code with me, so that I can try to figure out what is happening here?
Hi,
We are going to try to reproduce this issue and will update you asap
yes, everything that is downloaded is cached. The cache folder is set in your config file:
```
sdk {
    # ClearML - default SDK configuration
    storage {
        cache {
            # Defaults to system temp folder / cache
            default_base_dir: "~/.clearml/cache"
            size {
                # max_used_bytes = -1
                min_free_bytes = 10GB
                # cleanup_margin_percent = 5%
            }
        }
        direct_access: [
            # Objects matching are...
```
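So, for example, fetching the same object twice only downloads it once (a small sketch, the URL is a placeholder):
```
from clearml import StorageManager

# the first call downloads into the cache folder above;
# any later call returns the already-cached local copy
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/data/dataset.zip")
print(local_path)
```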
for instance:
export CLEARML_AGENT__AGENT__PACKAGE_MANAGER__TYPE=conda && clearml-agent daemon --queue my_queue
Hi MoodySparrow34
We have a user who wrote this example: https://github.com/marekcygan/clearml-slurm-workers
It is simple glue code to spin up SLURM workers when tasks are enqueued. Hope it helps!
Do you think you could send us a bit of code so we can better understand how to reproduce the bug? In particular, how you use dotenv...
So far, something like this works normally with both clearml 1.3.2 & 1.4.0:
```
task = Task.init(project_name=project_name, task_name=task_name)
img_path = os.path.normpath("**/Images")
img_path = os.path.join(img_path, "*.png")
print("==> Uploading to Azure")
remote_url = "azure://****.blob.core.windows.net/*****/"
StorageManager.uplo...
```
Hi UnevenDolphin73
The difference between v1.3.2 and v1.4.x (regarding download_folder) is that in 1.4.x the subfolder structure is maintained, so the .env file is not downloaded directly into the provided local folder (hence "./") if it is not in the bucket's root folder. The function reproduces the subdirectory structure of the bucket. So you will need to pass load_env() the path to the .env file (the full path, including the env filename).
For example, if I do:
StorageManager.down...
but on the other hand, when you browse your minio console, all the buckets are shown as directories, right? There is no file in the root dir. So we used the same logic and decided to reproduce that very same structure. Thus, when you browse the local_folder, you will have the same structure as shown in the console.
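To make it concrete, a small sketch of what I mean (the bucket name and paths are placeholders, and I am assuming your .env sits in a subfolder of the bucket):
```
from clearml import StorageManager

# hypothetical layout: the file lives at <bucket>/configs/.env, not at the root
StorageManager.download_folder(remote_url="s3://my-bucket/", local_folder="./")

# 1.4.x reproduces the bucket structure locally, so the file lands at ./configs/.env
# and that full path (including the filename) is what load_env() needs
env_path = "./configs/.env"
```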
JuicyFox94
can you try again after upgrading to 3.6.2?
hey ApprehensiveSeahorse83
can you please check that the trigger is correctly added? Simply retrieve the return value of add_task_trigger:
res = trigger.add_task_trigger( .....
print(f'Trigger correctly added ? {res}')
Hi EnormousWorm79
The PyCharm test runner wraps the script into a local script (the jb pytest runner), and that's what you are getting. Because it is local, you lose the source info.
Let me check if I have a workaround or solution for you. I'll keep you updated.