Hi @<1556450111259676672:profile|PlainSeaurchin97>
While testing the migration, we found that all of our models had their MODEL URL set to the IP of the old server.
Yes, all the artifacts/models/debug-samples are stored "as is", which means that if you configured your original setup with an IP, it is kind of stuck there. This is why it is always preferred to use a host-name ...
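For example, the relevant section of clearml.conf would point at a host-name rather than an IP; a sketch (the server name is a placeholder, ports are the defaults):

api {
    web_server: http://clearml-server.example.com:8080
    api_server: http://clearml-server.example.com:8008
    files_server: http://clearml-server.example.com:8081
}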
you apparently also need to rename all model URLs
Yes 🙂
Hi QuizzicalDove0
I guess the reason is that the idea is that integration is literally 2 lines, and it will take less time to execute the code on a system with a working env (we assume there is one) than to configure all the git, python packages, arguments etc...
All that said, you can create an experiment from code, using Task.import_task https://allegro.ai/docs/task.html#trains.task.Task.import_task
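A minimal sketch of that flow (project/experiment names are placeholders, assuming the export/import pair from the linked trains docs):

from trains import Task

# Export an existing experiment into a plain dict, then re-create it as a new Task
source = Task.get_task(project_name="examples", task_name="my experiment")
task_data = source.export_task()
new_task = Task.import_task(task_data)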
BroadMole98
I'm still exploring what trains is for.
I guess you can think of Trains as Experiment manager + MLOps tied together.
The idea is to give a quick and easy way to move from coding/running on one machine to scaling it to multiple remote machines, with everything that comes with it.
In some ways it is like snakemake: it sets up your environment and executes the code. Snakemake also allows you to set up data, which in Trains is done via code (StorageManager); pipelines are also...
Hi SteadyFox10, this one will get all the last metric scalars: train_logger.get_last_scalar_metrics()
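For context, a short usage sketch (assuming train_logger is the Task object; project/task names are placeholders, and the returned shape below is an assumption):

from trains import Task

train_logger = Task.get_task(project_name="examples", task_name="my experiment")
metrics = train_logger.get_last_scalar_metrics()
# expected shape: {title: {series: {"last": v, "min": v, "max": v}}}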
Hey GiganticTurtle0,
So basically the issue is that the pipeline function (prediction_service) is getting a dict as input, and it is expecting to get basic types... if you were to do the following, it would have worked as expected:
prediction_service(**default_config)
I will make sure we flatten any dictionary so that we end up with config/start, instead of a serialized version of the dict.
wdyt?
OddAlligator72 quick question:
suggest that you implement a simple entry-point API
How would the system get the correct packages / git repo / arguments if you are only passing a single function entrypoint ?
5 seconds is the sleep between two consecutive pulls when there are no jobs to process; why would you want a higher pull frequency?
The issue only arises upon sending Images. (Both numpy, mpl and PIL)
BTW: they should appear under the Debug Samples tab in the Results section
Hi SubstantialElk6, I'll start at the end: you can run your code directly on the remote GPU machine 🙂
See the clearml-task documentation on how to create a task from existing code and launch it:
https://github.com/allegroai/clearml/blob/master/docs/clearml-task.md
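A minimal sketch of a clearml-task invocation (repo URL, script name and queue are placeholders):

clearml-task --project examples --name remote-run --repo https://github.com/user/repo.git --script train.py --queue default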
That said, the idea is that you add the Task.init call when you are writing/coding the code itself, then later when you want to run it remotely you already have everything defined in the UI.
Make sense ?
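For reference, the integration itself is just two lines (project/task names are placeholders):

from clearml import Task
task = Task.init(project_name="examples", task_name="my experiment")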
MiniatureCrocodile39 from the screenshot I imagine you are running inside a docker, which means that when you restart the docker, the configuration file is lost.
Could that be the case ?
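One way around that is mounting the configuration file from the host into the container; a sketch (image name and paths are assumptions):

docker run -v $HOME/clearml.conf:/root/clearml.conf my-image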
Is there any way to get just one dataset folder of a Dataset? e.g. only "train" or only "dev"?
They are usually stored in the same "zip", so basically you have to download both folders anyhow. But I guess if this saves space we could add this functionality, wdyt?
Hmm, you either need to run with sudo or make sure the running user has docker run permissions
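For the latter, the usual approach is adding the user to the docker group, e.g.:

sudo usermod -aG docker $USER
# then log out and back in for it to take effect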
So this is why 🙂
an agent can only run one Task at a time.
The HPO (being a Task on its own) should run on the "services" queue, where the agent can run multiple "cpu controller" Tasks like the HPO.
Make sense ?
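For reference, a sketch of launching such an agent (queue name per the standard setup, flags from the clearml-agent CLI if I'm not mistaken):

clearml-agent daemon --services-mode --queue services --detached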
Hi @<1561885921379356672:profile|GorgeousPuppy74>
Please use threads to ask questions, so we keep everything tidy
(and if you can, please remove your first message and merge it into this one, for better readability)
Regarding the issue, you need to have clearml.conf in your home folder; I'm assuming this is /root/ and not /home/ubuntu/.
Also not sure why you need to expose ports...
And after having called Task.init() the second time, the automatic logging of resources and tensorboard plots works as well. I would recommend adding explanation to the docs for
Oh yeah! you always need to call Task.init first; Task.current_task() can be called from anywhere you like, but only after Task.init was called.
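In short (project/task names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="demo")  # must be called first
# ... later, anywhere in the same process:
same_task = Task.current_task()  # returns the task created above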
Hi @<1720249416255803392:profile|IdealMole15>
I'm assuming you mean on a remote machine with clearml-agent running ?
If you do, then you either use clearml-task to create a Task (Job) and specify the container and script, or click on "Create New Experiment" in the UI, fill out the git repo / script, and specify the docker image.
Make sense?
Hi JitteryCoyote63 , let me check, this backwards compatibility might only apply for API version mismatch between the client and server.
Hi CostlyElephant1
What do you mean by "delete raw data"? Data is always fetched to cached folders and clearml takes care of cache cleanup
That said, notice that the mutable copy target is a folder you specify; in this case you should definitely delete it after usage. Wdyt?
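Assuming this refers to Dataset.get_mutable_local_copy, a sketch (project/dataset names and target folder are placeholders):

from clearml import Dataset
import shutil

ds = Dataset.get(dataset_project="examples", dataset_name="my dataset")
local_copy = ds.get_mutable_local_copy(target_folder="/tmp/my_dataset_copy")
# ... use the files, then clean up the copy yourself:
shutil.rmtree(local_copy)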
SteadyFox10 With pleasure 🙂
BTW: you can retrieve the Task id from its name with Task.get_tasks(project_name='my project', task_name='my task name')
See https://allegro.ai/docs/task.html?highlight=get_tasks#trains.task.Task.get_tasks
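Meaning something along these lines (get_tasks returns a list of matching Task objects):

tasks = Task.get_tasks(project_name='my project', task_name='my task name')
task_id = tasks[0].id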
Could you test with the latest "clearml"? pip install git+
Task.add_requirements(".") should be supported now 🙂
Fixing that would make this feature great.
Hmm, I guess that is doable. This is a good point: searching for the GUID is not always trivial (or maybe at least we can put the project/dataset/version in the description)
I'm not sure if this was solved, but I am encountering a similar issue.
Yep, it was solved (I think v1.7+)
With spawn and forkserver (which is used in the script above) ClearML is not able to automatically capture PyTorch scalars and artifacts.
The "trick" is to have Task.init before you spawn your code, then (since your code will not start from the same state), you should call Task.current_task(), which would basically make sure everything is...
Hi @<1628565287957696512:profile|AloofBat92>
Yeah the name is confusing, we should probably change that. The idea is it is a low code / high code way to train your own LLM and deploy it. Not really a 1:1 ChatGPT comparison; more like GenAI for enterprises. Make sense?
Since I'm assuming there is no actual task to run, and you do not need to setup the environment (is that correct?)
you can do:
$ CLEARML_OFFLINE_MODE=1 python3 my_main.py
wdyt?
DepressedChimpanzee34 what would be easier, curl or python?
set the following:
CLEARML_AGENT_DISABLE_SSH_MOUNT=1 clearml-agent daemon ...
The issue is, it will automatically mount the .ssh of the host into the container, so that if you are using SSH to clone git you have credentials, in your case, it also mounts the configuration, hence failing to login.
I will make sure we add it to the configuration file, so it is more visible
, is the team open to PRs from external people?
Yes please do! PRs are welcome! I thought we fixed the GitHub readme to reflect it; anyhow I'll make sure we do 🙂
EnviousStarfish54
Can you check with the latest clearml from github? pip install git+