I’m curious what the opinions are on this! I asked myself the same question. In my limited experience, going through a workflow with SageMaker was a painful process, and one that required a ton of AWS-specific code and configuration. Compared to this, ClearML was easy and quick to set up, and provides a dashboard where everything from experiments to models to output is organised, queryable and comparable. Way less hassle for way more benefits.
Hi Jake! The clearml.conf file content is exactly the api section that is given by our clearml server, copied using the copy button, something like
api {
    web_server: http:// .. :8080
    api_server: http:// .. :8008
    files_server: http:// .. :8081
    credentials {
        "access_key" = "KEY"
        "secret_key" = "SECRET"
    }
}
clearml version 1.9.0
The strange thing is that the configuration works perfectly on my machine. My coworker’s machine does have a different p...
Switched off the windows defender FW, no load balancer present, still not working 😕
So we got it! Still don’t understand it though.
I generated the credentials on the web ui and sent them to my coworker, they did not work at all.
He generated his own credentials and they work!
Still have no clue. Is something going wrong when reading the file, perhaps due to a certain encoding? Due to Windows? Or maybe Python?
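One way to test the encoding hunch is to look at the raw bytes of the config file on both machines and compare. The snippet below is only a minimal sketch: the function names and the `~/clearml.conf` path are assumptions for illustration, and nothing in this thread confirms that encoding was the actual cause.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def inspect_config_bytes(raw: bytes) -> dict:
    """Report byte-level features that commonly differ between machines."""
    return {
        # A byte-order mark that some Windows editors prepend to text files.
        "has_utf8_bom": raw.startswith(UTF8_BOM),
        "has_utf16_bom": raw.startswith((b"\xff\xfe", b"\xfe\xff")),
        # Windows-style line endings.
        "has_crlf": b"\r\n" in raw,
        # Offsets of bytes outside plain ASCII (e.g. curly quotes pasted from chat).
        "non_ascii_offsets": [i for i, b in enumerate(raw) if b > 0x7F],
    }


def inspect_config_file(path: str) -> dict:
    """Read the file as bytes so no decoding happens before the check."""
    return inspect_config_bytes(Path(path).expanduser().read_bytes())
```

Running `inspect_config_file("~/clearml.conf")` on the working and the non-working machine and comparing the two results would show whether the files differ at the byte level.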
Personally I’ve found this (sort-of hacky) approach to work, by passing your git credentials as environment variables to the agent’s docker and cloning the repo in the code. You’ll have to make sure you have the right packages installed though.
import os
from subprocess import call

if 'GIT_USER' in os.environ:
    # Credentials are passed to the agent's docker as environment variables
    git_name, git_pass = os.environ['GIT_USER'], os.environ['GIT_PASS']
    call(f'git clone https://{git_name}:{git_pass}@gitlab.com/myuser/myrepo', shell=True)
    global myrepo
    from myrepo import func
elif local_re...
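A slightly more defensive variant of the same idea, as a sketch only: it percent-encodes the credentials and avoids `shell=True`. The helper names `build_clone_url` and `clone_repo` are made up for illustration; `GIT_USER` and `GIT_PASS` are the environment variables from the snippet above.

```python
import os
import subprocess
from urllib.parse import quote


def build_clone_url(user: str, password: str, host_path: str) -> str:
    """Build an HTTPS clone URL with percent-encoded credentials."""
    # Encoding keeps characters such as '@' or ':' in a password from breaking the URL.
    return f"https://{quote(user, safe='')}:{quote(password, safe='')}@{host_path}"


def clone_repo(host_path: str, dest: str) -> None:
    """Clone the repo using credentials taken from the environment."""
    url = build_clone_url(os.environ["GIT_USER"], os.environ["GIT_PASS"], host_path)
    # Passing a list (no shell) means the credentials are never interpreted by a shell.
    subprocess.run(["git", "clone", url, dest], check=True)
```

For example, `clone_repo("gitlab.com/myuser/myrepo", "myrepo")` would clone into a local `myrepo` folder, provided both environment variables are set.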
Thanks a lot! I’m still in the process of setting up, so running on a remote worker has not been successful yet, but I’ll report back on this issue if that fixes it!
Additionally, I have found it helpful to take a look at the agent’s working directory. The Python error should include the location of the script, and browsing that directory may tell you a bit more.
Just to be sure, you could download the repo and put this script in the root, and use the PipelineDecorator.debug_pipeline() option to run it locally and see if the code works like you wanted 🙂
If I use the PipelineDecorator.debug_pipeline() everything works as expected
Just checked and it’s not there, even for the pipeline that ran successfully on a remote worker. Do note that the needed module is just a local folder with scripts. The differences between the successful pipeline (run locally and cloned in the UI) and the errored pipeline (run remotely) are also very hard for me to spot: they have the exact same Installed Packages and execution details
Hmm that sounds okay to me, could you send the clearml log with the ‘No module named ..’ error?
Hi Chingiz! Is the LIBRARY_IN_REPO in the root of the repo? How do you run this pipeline (with run_locally, debug_pipeline or with an agent)? And lastly, have you checked the clearml logs to see if the repo was correctly pulled?
I ask these questions because the pythonpath is the root of the repo and the repo can only be used when running the pipeline with an agent, IIRC
Reporting back: this example worked, but unfortunately did not run successfully when cloned in the UI, with an error of base_task_id is empty, akin to this previous Slack thread: https://clearml.slack.com/archives/CTK20V944/p1662954750025219. By editing the configuration object as mentioned above (programmatically also possible with the get and set configuration objects), the pipeline also worked when cloned 🙂
I’m also not sure but it seems like the slack trial renews from time to time in this workspace, which eventually gives access to those older threads
Hi Mark! Do you set any of the decorator parameters using variables? That was my issue. Instead of using Python variables, I hardcoded one potential value, and then used the get and set methods to change it when cloning programmatically, which should be the same as changing it in the configuration tab when cloning with the UI. Hope this helps 🙂
I think I got it! I found that the branch for the component is specified in the UI in the component’s configuration object under the pipeline’s configuration tab. In theory I should be able to clone the pipeline task, use the get_configuration_object method, change the branch, set it using the set_configuration_object method, and finally enqueue! Going to test this out
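The plan above could look roughly like the sketch below. It assumes the branch name appears verbatim in the configuration object's text; the function names, the clone name, and all arguments (`config_name`, `queue_name` and so on) are placeholders, not values from this thread. The plain string replacement is crude and would also hit any other occurrence of the old branch name in that text.

```python
def swap_branch(config_text: str, old_branch: str, new_branch: str) -> str:
    """Replace the branch name inside a configuration object's text."""
    if old_branch not in config_text:
        raise ValueError(f"branch {old_branch!r} not found in configuration text")
    return config_text.replace(old_branch, new_branch)


def clone_with_branch(pipeline_task_id, config_name, old_branch, new_branch, queue_name):
    """Clone a pipeline task, edit one configuration object, and enqueue the clone."""
    # Imported here so swap_branch can be tried without clearml installed.
    from clearml import Task

    cloned = Task.clone(source_task=pipeline_task_id, name="pipeline (branch override)")
    text = cloned.get_configuration_object(config_name)
    if text is None:
        raise ValueError(f"no configuration object named {config_name!r}")
    cloned.set_configuration_object(
        config_name, config_text=swap_branch(text, old_branch, new_branch)
    )
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned
```

The name of the configuration object to pass as `config_name` is the one shown for the component under the pipeline's configuration tab in the UI.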
Would this then be possible by cloning the task (which is a pipeline) and accessing the right subtask (the component which should be changed)?
Awesome! Really simple and clever, love it. Thanks Eugen!
I’ve seen that you can change the branch of a cloned task like so https://github.com/allegroai/clearml-actions-train-model/blob/7f47f16b438a4b05b91537f88e8813182f39f1fe/train_model.py#L14
It seems like the actual import statement worked, since there is no ‘ImportError: no module named x’
It seems to be working now, by running the pipeline locally with PipelineDecorator.run_locally() and running the script using the following command: PYTHONPATH="fill_in_your_current_dir" python pipeline.py
Cloning this in the UI and enqueueing now also allows remote execution.
Calling the script without PipelineDecorator.run_locally(), i.e. running the pipeline remotely, still gives the ModuleNotFoundError: No module named
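The `VAR=value command` form of that command is shell-specific, so on Windows it may be easier to launch the script from Python with the variable set. This is just a sketch of that launcher: the helper names are made up, and `pipeline.py` is the script name from the command above.

```python
import os
import subprocess
import sys


def env_with_pythonpath(directory, base_env=None):
    """Return a copy of the environment with `directory` prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    # os.pathsep is ':' on Linux/macOS and ';' on Windows.
    env["PYTHONPATH"] = directory if not existing else directory + os.pathsep + existing
    return env


def run_pipeline_script(script="pipeline.py"):
    """Run the script with the current directory on PYTHONPATH."""
    # Child processes started by the script inherit this environment as well.
    subprocess.run([sys.executable, script], env=env_with_pythonpath(os.getcwd()), check=True)
```

Calling `run_pipeline_script()` from the repo root should behave like the shell command above.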
The web ui is hosted at :8080, so make sure to add that to the end of the url as well 🙂
Yes, also present in the git repo (hosted on gitlab and seemed to correctly retrieve it, couldn’t find any errors about this in the logs)
This is run by using the UI’s ‘Run’ button without the ‘Advanced configuration’
Happens to all! Importing local packages in these decorated pipelines hasn’t really worked for me yet (except when running via PyCharm, which seems to make sure that the location of the original code is always in the path)
In any case, I’m happy it’s fully running now 🙂
Hi AgitatedDove14
My bad, I worded my question wrong I see, I meant the tasks of the pipeline’s components. (it shows that I’m a newbie 😅 )
This does make perfect sense though! The problem seems to just be that the components themselves are run on the same queue as the pipeline logic, even though I configured it differently
Oh yup, that seems very possible since I run it with the run_locally() and then clone this task in the UI